# Here is a problem taken from the "real world".
# simplified approach...
# In order to decide which of two colors to use on a
# company web site the design team calls together 100
# people. They divide the people into two 50 person groups.
# The first group is shown color one and the second group is
# shown color two. Each group is asked to rate the
# color they have been shown on a scale of 0 to 6 where
# 0 means they dislike the color and 6 means that they
# really like the color. The first group has a mean score
# of 4.8 while the second group has a mean score of 4.6.
# Some in the design team say "The first color is significantly
# better than the second color." Others in the design
# team say "The scores are too close to make that conclusion."
# What can we say about this, from a statistical perspective?
# First, the sampling is questionable. We do not know if
# the 100 people are in any way representative of people
# who will use the web site.
#
# After that, we could form a
# null hypothesis that the mean scores for the web site
# users, if we could find and query all of them, would
# be the same value for the two colors.
# Then our alternative is that the
# mean score for all web site users would be higher for
# the first color than it would be for the second color.
#
# Then we would turn to our t-test for the equality of
# two means, hypoth_2test_unknown(). However,
# to use that function we need to know the desired level
# of significance and we need to know the standard
# deviation of the two 50-person sample scores.
#
# Of these
# the former is easy. We will run the test at the 0.05
# level of significance.
#
# What about the latter issue:
# knowing the standard deviations? Let us see how those
# sample standard deviations might affect the analysis.
# First generate two samples, one with mean 4.8 and
# the other with mean 4.6.
# Do this to get a large
# standard deviation in each by taking almost all 5's and
# then including a few 1's and maybe a 3
first <- rep(5, 50)
first[1] <- 1
first[2] <- 1
first[3] <- 3
first   # look at the first
mean(first)
sd(first)
#
# Then, create a second sample, starting from the first
# and then changing just 3 values so that we alter the mean
# value of that second sample
second <- first
second[4] <- 1
second[5] <- 1
second[6] <- 3
second   # look at the second
mean(second)
sd(second)
# Now do a two population test with the null
# hypothesis that the two means are the same versus
# the alternative that the first mean is greater
# than the second.
#
source("../hypo_2unknown.R")
hypoth_2test_unknown(sd(first), 50, mean(first),
                     sd(second), 50, mean(second),
                     1, 0.05)
# the full Attd value of 0.16 means that if the null
# hypothesis is true then we would get samples with
# these kinds of differences, or more, in the means in about
# 1/6 of repeated samples of this size.
# Therefore, we do not have enough evidence
# in these two samples to reject the null hypothesis.
# Note that full Attd is meant as the attained
# or achieved significance using the full degrees of
# freedom.

# Now, do this again, but in samples called third and fourth.
# For these, however, we will
# make the standard deviation small by just using scores
# of 5's and 4's.
third <- c(rep(5, 40), rep(4, 10))
third   # look at third
mean(third)
sd(third)
fourth <- c(rep(5, 30), rep(4, 20))
fourth   # look at fourth
mean(fourth)
sd(fourth)
# Now run the same test, but with the new samples
hypoth_2test_unknown(sd(third), 50, mean(third),
                     sd(fourth), 50, mean(fourth),
                     1, 0.05)
# the full Attd value of 0.0146 tells us that if the
# null hypothesis were true then we would get two samples
# showing this difference, or one more extreme than this,
# in about 1.46% of the samples. That is too rare!
# Therefore, we would reject the null hypothesis at
# the 0.05 level, in favor of the alternative which says
# that the mean of the third is higher than is the mean
# of the fourth.

# Now let us look at the real data. First we can read in
# all of the data.
clean_data <- read.csv("josh_real_clean.csv")
# then look at it.
clean_data
str(clean_data)
# The two groups that we are examining are the FB color
# (items 1-36 and 109-122) and the GB color (items 37-71
# and 123-137). Those values represent the responses
# from the first group (all 50 people) and the second
# group (another 50 people).
# However, looking at the data we have a real problem.
# The $Confidence score is the result of asking "How confident
# are you based on the color?" The $Unease score is the
# result of asking "How uneasy are you based on the color?"
# These are opposite readings. The more "confident" you are
# the less "uneasy" you should be. If you respond with a 6
# for both questions then it is clear that you are not being
# truthful. You are just marking down answers.
# The same is true for $Trust and $Untrust. Items 109-137
# are clearly responses that are so conflicted that they are
# meaningless. We should ignore them.
# We are only interested in the values where the useif.0
# value is 0, and then we only want to look at the
# Confidence scores.
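#
# A minimal sketch of how such conflicted rows could be flagged
# and how the useif.0 filter could be applied. This is an
# illustration, not part of the original analysis.
# Assumptions: flag_conflicted() is a hypothetical helper, the
# cutoff of 5 for "both opposite scores are high" is a guess,
# and the toy rows below are made up (they are not taken from
# josh_real_clean.csv).
flag_conflicted <- function(df, cutoff = 5) {
  # TRUE when a row scores high on both members of an opposite pair
  (df$Confidence >= cutoff & df$Unease >= cutoff) |
    (df$Trust >= cutoff & df$Untrust >= cutoff)
}
toy <- data.frame(Confidence = c(6, 5, 2),
                  Unease     = c(6, 1, 3),
                  Trust      = c(4, 5, 6),
                  Untrust    = c(2, 1, 6),
                  useif.0    = c(1, 0, 1))
flag_conflicted(toy)                 # rows 1 and 3 are flagged
usable <- toy[toy$useif.0 == 0, ]    # keep only rows marked usable
usable$Confidence
# With the real data the same filter would presumably read
# clean_data[clean_data$useif.0 == 0, ]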
# The first group we want is the FB items, 1 through 36
clean_1 <- clean_data$Confidence[1:36]
mean(clean_1)
sd(clean_1)
# the second group we want is the GB items, 37 through 71
clean_2 <- clean_data$Confidence[37:71]
mean(clean_2)
sd(clean_2)
# Now we can run the test
hypoth_2test_unknown(sd(clean_1), length(clean_1), mean(clean_1),
                     sd(clean_2), length(clean_2), mean(clean_2),
                     1, 0.05)
# Based on the sample that we have, excluding clearly
# bad data and using the resulting means and standard
# deviations, the result is that we do not have evidence to
# reject the null hypothesis of "no difference between colors"
# in favor of the alternative "the FB color is better than
# the GB color" at the 0.05 level of significance.

######### the real case
# It turns out there was a third color tested, the HB's.
# Let us pull out those good items, 72 through 108.
# What if we compare the FB and HB scores on confidence?
clean_3 <- clean_data$Confidence[72:108]
mean(clean_3)
sd(clean_3)
hypoth_2test_unknown(sd(clean_1), length(clean_1), mean(clean_1),
                     sd(clean_3), length(clean_3), mean(clean_3),
                     1, 0.05)
# In this comparison, with an attained level of 0.0115, we would
# have significant evidence, at the 0.05 level, to reject the
# idea that the two colors, FB and HB, are viewed in the same
# way in favor of the hypothesis that the FB color
# engenders more confidence than does the HB color.

# We could go on to test GB and HB but even taking the step above
# should not have been done. What we really want to know is
# whether all the colors are the same or whether there is some
# difference. We should not be looking at different pairs,
# FB vs. GB, FB vs. HB, and then GB vs. HB. The appropriate
# test is called an ANalysis Of VAriance, or ANOVA, and
# at this time that is beyond the material of this course.
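#
# For the curious, here is a minimal sketch of what a single
# test across all three colors could look like, using base R's
# aov(). It is an illustration only, not part of the course
# material or of the original analysis.
# Assumptions: anova_3() is a hypothetical helper name, and the
# vectors in the demonstration call are made up just to show
# that the call runs.
anova_3 <- function(a, b, c, labels = c("FB", "GB", "HB")) {
  scores <- c(a, b, c)
  # one group label per score, in the same order as the scores
  color <- factor(rep(labels,
                      times = c(length(a), length(b), length(c))))
  summary(aov(scores ~ color))
}
# With the real data this would be called as
#   anova_3(clean_1, clean_2, clean_3)
anova_3(c(5, 5, 4, 5), c(4, 5, 4, 4), c(3, 4, 3, 4))
# The Pr(>F) column in the output plays the role that the
# attained significance played in the two-sample tests.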